Sphinx: Empowering Impala for Efficient Execution of SQL Queries on Big Spatial Data

Authors

  • Ahmed Eldawy
  • Ibrahim Sabek
  • Mostafa Elganainy
  • Ammar Bakeer
  • Ahmed Abdelmotaleb
  • Mohamed F. Mokbel
Abstract

This paper presents Sphinx, a full-fledged open-source system for big spatial data that overcomes the limitations of existing systems by adopting a standard SQL interface and by providing a highly efficient core built inside the Apache Impala system. Sphinx is composed of four main layers: query parser, indexer, query planner, and query executor. The query parser injects spatial data types and functions into the SQL interface of Sphinx. The indexer creates spatial indexes in Sphinx by adopting a two-layered index design. The query planner uses these indexes to construct efficient query plans for range query and spatial join operations. Finally, the query executor carries out these plans on big spatial datasets in a distributed cluster. A system prototype of Sphinx running on real datasets shows up to three orders of magnitude performance improvement over plain-vanilla Impala, SpatialHadoop, and PostGIS.
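As a rough illustration of how a two-layered index can help a range query, the sketch below is a minimal in-memory toy and not Sphinx's actual implementation. It assumes that the first (global) layer partitions space into a uniform grid and that the second (local) layer keeps each partition's points sorted by x; the abstract does not specify either detail. The names `TwoLayerIndex`, `range_query`, and `cells_per_side` are hypothetical, and all points are assumed to lie inside `space`.

```python
import bisect
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def intersects(a: Rect, b: Rect) -> bool:
    """True if two axis-aligned rectangles overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


class TwoLayerIndex:
    """Hypothetical two-layer index: a global grid over sorted local lists."""

    def __init__(self, points: List[Point], space: Rect, cells_per_side: int):
        self.space = space
        self.n = cells_per_side
        xmin, ymin, xmax, ymax = space
        self.cell_w = (xmax - xmin) / cells_per_side
        self.cell_h = (ymax - ymin) / cells_per_side
        # Global layer: map each grid cell to the points that fall in it.
        self.partitions: Dict[Tuple[int, int], List[Point]] = {}
        for p in points:
            self.partitions.setdefault(self._cell_of(p), []).append(p)
        # Local layer (assumed): keep each partition sorted by (x, y).
        for pts in self.partitions.values():
            pts.sort()

    def _cell_of(self, p: Point) -> Tuple[int, int]:
        xmin, ymin, _, _ = self.space
        # Clamp so points on the upper boundary land in the last cell.
        col = min(int((p[0] - xmin) / self.cell_w), self.n - 1)
        row = min(int((p[1] - ymin) / self.cell_h), self.n - 1)
        return (col, row)

    def _cell_rect(self, key: Tuple[int, int]) -> Rect:
        col, row = key
        xmin, ymin, _, _ = self.space
        return (xmin + col * self.cell_w, ymin + row * self.cell_h,
                xmin + (col + 1) * self.cell_w, ymin + (row + 1) * self.cell_h)

    def range_query(self, q: Rect) -> Tuple[List[Point], int]:
        """Return the points inside q and the number of partitions scanned."""
        results: List[Point] = []
        scanned = 0
        for key, pts in self.partitions.items():
            # Global pruning: skip partitions that cannot contain a match.
            if not intersects(self._cell_rect(key), q):
                continue
            scanned += 1
            # Local search: binary search on x, then filter on y.
            lo = bisect.bisect_left(pts, (q[0], float("-inf")))
            hi = bisect.bisect_right(pts, (q[2], float("inf")))
            results.extend(p for p in pts[lo:hi] if q[1] <= p[1] <= q[3])
        return sorted(results), scanned


points = [(1, 1), (2, 8), (6, 3), (9, 9), (4.5, 4.5)]
index = TwoLayerIndex(points, space=(0, 0, 10, 10), cells_per_side=2)
matches, scanned = index.range_query((0, 0, 4.9, 4.9))
print(matches, scanned)
```

In this toy, the global layer lets the query skip three of the four occupied partitions before any point is examined. In a distributed setting, the same pruning idea would decide which data blocks need to be read at all.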


Similar resources

ISP: Large-Scale In-memory Spatial Data Processing System (Demo Paper)

Huge amounts of spatial data, such as GPS locations, are generated every day, which poses major challenges for efficient spatial data processing. Most existing big spatial data processing techniques are based on disk-resident systems. They have not taken full advantage of modern hardware, such as large main memory capacities and multi-core processors. In this paper, we demonstrate our I...


Impala: A Modern, Open-Source SQL Engine for Hadoop

Cloudera Impala is a modern, open-source MPP SQL engine architected from the ground up for the Hadoop data processing environment. Impala provides low latency and high concurrency for BI/analytic read-mostly queries on Hadoop, not delivered by batch frameworks such as Apache Hive. This paper presents Impala from a user’s perspective, gives an overview of its architecture and main components and...


Runtime Code Generation in Cloudera Impala

In this paper we discuss how runtime code generation can be used in SQL engines to achieve better query execution times. Code generation allows query-specific information known only at runtime, such as column types and expression operators, to be used in performance-critical functions as if they were available at compile time, yielding more efficient implementations. We present Cloudera Impala,...


Optimization of Common Table Expressions in MPP Database Systems

Big Data analytics often include complex queries with similar or identical expressions, usually referred to as Common Table Expressions (CTEs). CTEs may be explicitly defined by users to simplify query formulations, or implicitly included in queries generated by business intelligence tools, financial applications and decision support systems. In Massively Parallel Processing (MPP) database syst...


Optimization of spatial join using constraints based- clustering techniques

Spatial joins are used to combine spatial objects. Efficient processing depends on the spatial queries. The execution time and input/output (I/O) time of spatial queries are crucial, because the spatial objects are very large and have several relations. In this article, we use several techniques to improve the efficiency of the spatial join: 1. We use R*-trees for spatial queries sinc...



Publication date: 2017